🧠 LLM Inference
Quantization, Attention Mechanisms, Batch Processing, KV Caching
Scoured 32515 posts in 12.2 ms
The Architecture Behind Open-Source LLMs
blog.bytebytego.com · 12h · 🏗️ LLM Infrastructure
Reasoning-Driven Multimodal LLM for Domain Generalization
arxiv.org · 1d · ✨ Gemini
Structured Outputs for LLMs
ternarysearch.blogspot.com · 22h · Discuss: Hacker News, ternarysearch.blogspot.com · 🦙 Ollama
Generative Bayesian Inference with GANs
jmlr.org · 17h · 📦 Batch Embeddings
pplx-embed: State-of-the-Art Embedding Models for Web-Scale Retrieval
research.perplexity.ai · 13h · 📌 Embedding Retrieval
Ask HN: Statistical learning and non-Statistical learning for humans
news.ycombinator.com · 15h · Discuss: Hacker News · 📊 Statistical Ranking
Tabular representation learning
breno.bearblog.dev · 16h · 🎨 Chroma
SLA-Aware Distributed LLM Inference Across Device-RAN-Cloud
arxiv.org · 1d · 🏗️ LLM Infrastructure
UQLM: A Python Package for Uncertainty Quantification in Large Language Models
jmlr.org · 17h · 🦙 Ollama
Efficient and Portable Mixture-of-Experts Communication
research.perplexity.ai · 8h · 🧠 Inference Serving
Right-sizes LLM models to your system's RAM, CPU, and GPU
news.ycombinator.com · 22h · Discuss: Hacker News · 🏗️ LLM Infrastructure
Explaining undesirable model behavior: (How) can influence functions help?
lesswrong.com · 17h · 🛡️ AI Security
Your AGENTS.md is a Liability
paddo.dev · 6h · 🪄 Prompt Engineering
Google AI Introduces STATIC: A Sparse Matrix Framework Delivering 948x Faster Constrained Decoding for LLM Based Generative Retrieval
marktechpost.com · 1d · 🔤 Tokenization
Inception Labs says its diffusion LLM is 10x faster than Claude, ChatGPT, Gemini
thenewstack.io · 7h · 🏗️ LLM Infrastructure
Optimise AI
mason.bearblog.dev · 1d · 📱 Edge AI Optimization
Understanding Rope: From Rotary Embeddings to Context Extension
mli0603.notion.site · 19h · Discuss: Hacker News · ✖️ Cross-encoders
Optimal Heterogeneous Memory Configs for AI Tasks Under Specified Performance Metrics (Stanford, UCSC)
semiengineering.com · 1d · 🧠 Memory Hierarchy Design
Annotated Source
ashish01.github.io · 1d · 🔤 Tokenization
Qwen3.5-4B-GGUF is here!
huggingface.co · 15h · Discuss: r/LocalLLaMA · 🏗️ LLM Infrastructure